Importance Sampling for Reinforcement Learning with Multiple Objectives

Authors

  • Christian Robert Shelton
  • Tomaso Poggio
  • Christian Robert
Abstract

This thesis considers three complications that arise from applying reinforcement learning to a real-world application. In the process of using reinforcement learning to build an adaptive electronic market-maker, we find the sparsity of data, the partial observability of the domain, and the multiple objectives of the agent to cause serious problems for existing reinforcement learning algorithms. We employ importance sampling (likelihood ratios) to achieve good performance in partially observable Markov decision processes with few data. Our importance sampling estimator requires no knowledge about the environment and places few restrictions on the method of collecting data. It can be used efficiently with reactive controllers, finite-state controllers, or policies with function approximation. We present theoretical analyses of the estimator and incorporate it into a reinforcement learning algorithm. Additionally, this method provides a complete return surface which can be used to balance multiple objectives dynamically. We demonstrate the need for multiple goals in a variety of applications and present natural solutions based on our sampling method. The thesis concludes with example results from applying our algorithm to the domain of automated electronic market-making.

Thesis Supervisor: Tomaso Poggio
Title: Professor of Brain and Cognitive Science

This thesis describes research done within the Department of Electrical Engineering and Computer Science and the Department of Brain & Cognitive Sciences within the Center for Biological & Computational Learning and the Artificial Intelligence Laboratory at the Massachusetts Institute of Technology. This research was sponsored by grants from the Office of Naval Research (DARPA) under contract No. N00014-00-1-0907, National Science Foundation (ITR) under contract No. IIS-0085836, National Science Foundation (KDI) under contract No. DMS9872936, and National Science Foundation under contract No. IIS-9800032.

Additional support was provided by: Central Research Institute of Electric Power Industry, Center for e-Business (MIT), Eastman Kodak Company, DaimlerChrysler AG, Compaq, Honda R&D Co. Ltd., Komatsu Ltd., Merrill-Lynch, NEC Fund, Nippon Telegraph & Telephone, Siemens Corporate Research, Inc., Toyota Motor Corporation, and The Whitaker Foundation.

I know this thesis would have been more difficult and of lesser quality were it not for the patience and trust of my advisor, the flexibility and approachability of my committee, the academic and social support of the AI Lab graduate students, and the faith and love of my parents.
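
As a rough illustration of the importance sampling (likelihood ratio) idea mentioned in the abstract, the sketch below estimates the expected return of a target policy from trajectories collected under a different behavior policy. It is a generic textbook-style estimator, not necessarily the one developed in the thesis. The names `trajectory_ratio` and `is_estimate`, the integer encoding of observations and actions, and the restriction to reactive (memoryless) policies are all assumptions made for this example. Because the unknown environment dynamics appear in both the numerator and the denominator of the trajectory likelihood ratio, they cancel, so only the action probabilities of the two policies are needed.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical types for this sketch: a reactive policy maps
# (observation, action) to the probability of choosing that action.
Policy = Callable[[int, int], float]
# A trajectory is a sequence of (observation, action) steps plus its total return.
Trajectory = Tuple[Sequence[Tuple[int, int]], float]


def trajectory_ratio(steps: Sequence[Tuple[int, int]],
                     target: Policy, behavior: Policy) -> float:
    """Likelihood ratio of one trajectory under target vs. behavior policy.

    Environment transition probabilities cancel, so only the per-step
    action probabilities enter the product.
    """
    ratio = 1.0
    for obs, action in steps:
        ratio *= target(obs, action) / behavior(obs, action)
    return ratio


def is_estimate(trajectories: List[Trajectory], target: Policy,
                behavior: Policy, normalized: bool = False) -> float:
    """Importance sampling estimate of the target policy's expected return.

    normalized=False divides by the number of trajectories (unbiased);
    normalized=True divides by the sum of the weights (biased, but
    usually lower variance).
    """
    weights = [trajectory_ratio(steps, target, behavior)
               for steps, _ in trajectories]
    weighted_sum = sum(w * ret for w, (_, ret) in zip(weights, trajectories))
    denom = sum(weights) if normalized else float(len(trajectories))
    if denom == 0:
        raise ValueError("no trajectories, or all importance weights are zero")
    return weighted_sum / denom
```

The behavior policy must assign nonzero probability to every action the target policy might take; otherwise the ratio is undefined and the estimate cannot cover those trajectories.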

Similar resources

Importance sampling for reinforcement learning with multiple objectives

This thesis considers three complications that arise from applying reinforcement learning to a real-world application. In the process of using reinforcement learning to build an adaptive electronic market-maker, we find the sparsity of data, the partial observability of the domain, and the multiple objectives of the agent to cause serious problems for existing reinforcement learning algorithms....

Competitive-Cooperative-Concurrent Reinforcement Learning with Importance Sampling

The speed and performance of learning depend on the complexity of the learner. A simple learner with few parameters and no internal states can quickly obtain a reactive policy, but its performance is limited. A learner with many parameters and internal states may finally achieve high performance, but it may take enormous time for learning. Therefore, it is difficult to decide in advance which a...

Truncated Importance Sampling for Reinforcement Learning with Experience Replay

Reinforcement Learning (RL) is considered here as an adaptation technique of neural controllers of machines. The goal is to make Actor-Critic algorithms require less agent-environment interaction to obtain policies of the same quality, at the cost of additional background computations. We propose to achieve this goal in the spirit of experience replay. An estimation method of improvement direct...

Balanced Importance Sampling Estimation

In this paper we analyze a particular issue of estimation, namely the estimation of the expected value of an unknown function for a given distribution, with the samples drawn from other distributions. A motivation of this problem comes from machine learning. In reinforcement learning, an intelligent agent that learns to make decisions in an unknown environment encounters the problem of judging ...

Manifold-based multi-objective policy search with sample reuse

Many real-world applications are characterized by multiple conflicting objectives. In such problems optimality is replaced by Pareto optimality and the goal is to find the Pareto frontier, a set of solutions representing different compromises among the objectives. Despite recent advances in multi-objective optimization, achieving an accurate representation of the Pareto frontier is still an imp...

Publication date: 2014